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1 Extended Multiplier 



1 J Purpose 

The purpose of this extended multiplier, is to build more functionality into the existing 
multiplier, to optimize it for the expected customer applications. This will be accomplished, by 
adding 2 additional multipliers per tile, and 4 additional adders. These will all be added to the 
current multiplier block, in essence creating a large functional unit that is capable of significant 
processing. 

In order to reduce the overhead in this effort, it has been decided to use the same input muxes 
and output muxes as the current multipliers. This means that we have a block that has 4 * 32-bit 
inputs and 2 * 32-bit outputs. Thus in order to take advantage of having 4 multipliers, the inputs 
somehow have to be shared between them, and the outputs need to be shared as well. The 
sharing of the inputs is generally accomplished by packing the inputs into the high and low 
halves of the input words. The output sharing is accomplished by either accumulating the results 
through the new adders, or packing the results into the output registers. 

Finally, for backward compatibility, we need to provide for the case where the new features are 
bypassed. This is accomplished by making it so that the input and output muxes are selected in 
the same fashion for the old bits, and the new bits (when defaulted to 0) do not effect the circuit. 
The opcode is also designed so that the default case of all 0's causes the mutlipliers to behave as 
before. 
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2 Implementation 

The XMULT or "extended multiplier" builds on the existing multiplier block by maintaining the 
existing I/O structure. It adds 2 additional multipliers and 4 adders to create a structure which 
can be configured as for a variety of different functions. A simplified diagram of the multiplier 
is illustrated below, along with a table of the target opcodes. The wiring pattern illustrates a 
backwards compatible mode. Components in bold illustrate the existing multiplier. Each adder or 
multiplier has 2 input muxes and an optional output register. 




opcode 


latency 


function 


2MULT 


2 


current multiplier mode 


4ADD32 


3 


sum of 4 32-bit inputs 


4ADD16 


4 


sum of 4 sets of packed 16 bit inputs 


4MULT 


2 


4 independent multipliers with 1 6-bit packed inputs and outputs 


4MULTSUM 


4 


4 multipliers with 16-bit packed inputs and outputs summed together in tree 


4MULT2SUM 


3 


4 multipliers with 16-bit packed inputs and outputs summed together in tree 


4FIR 


N/A 


4 multipliers with 16-Bit packed coefficients, a single 16-Bit data input, a 
32-bit accumulation input and a 3 2 -Bit outputs accumulated in cascade 
with pipeline registers between accumulators 


CMULT 


4 


16-Bit packed complex multiply with 32-Bit IQ accumulation input, output 


CMULT16 


3 


16-Bit packed complex multiply with FFT butterfly adders with complex 
input, output 
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OPERATOR MUX selects by OPCODE 



MUX 


n 


OPCODE 






2MULT 


4ADD32 


4ADD16 


4MULT 


4MULT+ 


4MULT2 
+ 


4FIR 


CMULT1 
6 


CMULT 






i0[31:16] 






i0[31:16] 


i0[31:16] 


i0[31:16] 


i2[15:0] 


i0[31:16] 


i0[31:16] 






i0[15:0] 


















mi iltn.K 
FT (UilU'LJ 




i1[31:8] 






11 [31: 16] 


i1[31:16] 


i1 [31:16] 


i1[31:16] 


i1[31:16] 


i1[31:16] 






i1[31:16] 






















i1[15:0] 


















mi i If "1 _ d 










i0[15:0] 


i0[15:0] 


i0[15:0l 


i2[15:0] 


i0[15:0] 


i0[15:0] 


mnlt1-h 

1 1 lull 1 u 










i1[15:0] 


M [15:0] 


i1[15:0] 


5:0] 


i1[15:0] 


M[15:0] 


mult9-s» 




i2[31:16] 






i2[31:16] 


i2[31:16] 


i2[31:16] 


i2[15:0] 


i0[31:16] 


i0[31:16] 






i2[15:0] 


















mnlt9-h 




i3[31:8] 






i3[31:16] 


i3[31:16] 


i3[31:16] 


i3[31:16] 


i1[15:0] 


i1[15:0] 






i3[31:16] 






















i3[15:0] 


















miilt3-a 

1 1 1 Ul Iwl CI 










i2[15:0] 


i2[15:0] 


i2[15:0] 


i2[15:0] 


i0[15:0] 


i0[1 5:0] 


1 1 IUILO U 










i3[15:0] 


i3[15:0] 


i3[1 5:0] 


i3[1 5:0] 


i1 [31:16] 


M[31:16] 


addO-a 

auuu a 


o 

c 




i0[31:0] 


i0[31:0] 




m0[31:0] 


m0[31:0] 


i0[31:0] 


m0[31 :0] 


m0[31:0] 


addO-b 






M [31:0] 


M [31 :0] 




m1[31:0] 


m1[31:0] 


m1[31:0] 


~m1 + 1 


-ml + 1 


add1-a 


3 




i2[31:0] 


i2[31:0] 




m2[31:0] 


m2[31 :0] 


a2[31 :0] 


m2[31 :0] 


m2[31:0] 


add1-b 


2 




i3[31:0] 


i3[31:0] 




m3[31:0] 


m3[31:0] 


m3[31 :0] 


m3[31 :0] 


m3[31:0] 


add2-a 


2 




a0[31:0] 


a0[31:0] 




a0[31:0] 




m0[31 :0] 




a0[31:0] 


add2-b 


3 




a1[31:0] 


a1[31:0] 




a1[31:0] 




a0[31:0] 




i2[31:0] 


add3-a 


3 






a2[31:16] 








a1[31:0] 


32'hO 


a1[31:0] 


add3-b 








a2[15:0] 








m2[31:0] 


i3[31 :0] 


i3[31:0] 


omuxO 


6 


m0[31 :0] 


a2[31:0] 


a2[31:0] 


m0[31:16] 


a2[31:0] 


a0[31:0] 




a0[31:16] 


a2[31:0] 












m1[31:16] 








a1 [31:16] 




omuxl 


6 


m2[31 :0] 


a2[Cout] 


a3[31:0] 


m2[31:16] 


a2[Cout] 


a1[31:0] 


a3[31:0] 


a3[31:0] 


a3[31:0] 












m3[31:16] 













i0-i3 = input mux 
m0-m3=mults 
a0-a3=adder 
-a = operand a 
-b= operand b 



Carry of 32-bit ADD operation is not brought out unless explicitly specified in this document. 
The precision of the ADD operation right after the multipliers is not lost due to the duplicate sign 
bit in the result of the multipliers. For any other additions, it the user's responsibility to avoid the 
event of a overflow. The user might also use the shift-down operation in the result of the adders 
to reduce the loss of precision. 
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2 JL Bit file encoding 

There are 10 bits in the CSMMULT that are not used. These are 'RESERVED' fields, as well as 
the multji Ism wen, and the multjh Ism dynamic mode bits. The wen is not used, as lsm3 is 
never used as a write Ism unless it is connected to at least one other Ism. The write enable is 
routed with the write address, and the multiplier cannot generate the write address. The dynamic 
mode bits are routed with the write data, and the multiplier cannot generate write data. Thus, 
neither of these fields is meaningful in the CSMMULT. 

The multiplier a input mux select is named muxafghsel, which is currently a 1 bit field. We will 
extend this to 2 bits for both mutlipliers, at the cost of 2 CSM bits. The output select is named 
muxmultlsmsel, which is currently a 2 bit field. We will extend this to 3 bits for both 
mutlipliers, at the cost of 2 CSM bits. We will add a 5 bit opcode, which will essentially be 
shared, which will therefore cost 5 CSM bits. One of the remaining 2 bits will be used to 
selectively shift all of the multiplier results up by one bits, thereby normalizing off the redundant 
sign bit. The other remaining bit will be used to selectively shift the adder outputs down by one 
bit, in order to normalize the adder results. Thus, all of the available CSMMULT bits will be 
utilized by the new design. 
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Opcodes (new 4 bit CSMM1 


LFLT field) 


Name 


Bit 4 


Bit 3 


Bit 2 


Bit 1 


BitO 


Multbypass[3:0] 


Adder s16bit[3:0l 


Adder bypassf3:01 


2MULT 


0 


0 


0 


0 


0 


x1x1 


Xxxx 


xxxx 


4ADD32 


0 


0 


1 


0 


0 


xxxx 


xOOO 


1100 


4ADD16 


0 


0 


1 


0 


1 


xxxx 


1111 


1000 


4MULT 


0 




0 


0 


0 


0000 


Xxxx 


xxxx 


4MULTSUM 


0 




0 


0 


1 


0000 


xOOO 


x100 


4MULT2SUM 


0 




0 


1 


0 


0000 


xxOO 


1111 


4FIR 


0 




1 


0 


0 


0000 


0000 


0000 


CMULT 


0 




1 


1 


0 


0000 


0000 


1100 


CMULT16 


0 




1 


1 


1 


0000 


0x11 


0x11 



Input Muxes (2 new CSMMULT bits) 



Mux / Select 


0 


1 


2 


3 


multO-a 


i0[15:0l 


i0[31:16] 


i2[15:0] 


16'hO 


multO-b 


i1 [15:01 


M[31:16] 


11 [31:8] 


24'hO 


mult1-a (multO-a) 


i0[31:16l 


i0[1 5:0] 


i2[15:0] 


16'hO 


multt-b (multO-b) 


i1[31:16] 


i1[15:0] 


i1[31:16] 


16'hO 


mult2-a 


16'hO 


i2[31:16] 


i0[31:16] 


i2[15:0] 


mult2-b 


16'hO 


i3[31:16] 


i3[31:8] 


i1[15:0] 


mult3-a (mult2-a) 


16'hO 


i2[15:0] 


I0[15:0] 


i2[15:0] 


mult3-b (mult2-b) 


16'hO 


i3[1 5:0] 


16'hO 


i1[31:16] 



Note that the multl and mult3 a and b operand muxes are derived from the multO and mult2 a 
and b operand muxes respectively. For the 16 bit high low selects, it is useful to note that 
selecting the high part of the word for multO selects the low part of the same word for multO. 
This is true for the low bits of all of the select fields, but the high bits are reserved for the more 
special cases, such as 24*16 multiplies, the FIR opcode, and the CMULT opcode inputs. 



Mux input for adders (controls are c 


lecoded by opcodes) 


Mux / Select 


0 


1 


2 


3 


addO-a 


i0r31:0l 


m0[31:0l 


32'hO 


desp0[31:0l 


addO-b 


i1 [31:01 


ml [31:01 


-ml [31:0]+1 


32'hO 


add1-a 


i2f31:0l 


m2[31:0] 


a2[31:0] 


despl [31:01 


add1-b 


i3[31:16l 


m3[31:0l 


32'hO 


32'hO 


add2-a 


a0[31 :0] 


m0[31:0l 


32'hO 


desp2[31:0l 


add2-b 


a1[31:0] 


a0[31:0] 


i2[31 :0] 


32'hO 


add3-a 


a2[31:16l 


a1 [31:01 


32'hO 


desp3[31:0l 


add3-b 


a2[15:0] 


m2[31:0l 


i3[31 :01 


32'hO 



Output Muxes (2 new CSMMULT bits) 



Mux / Select 


0 


1 


2 


3 


4 


5 


6 


7 


omuxO (mult-h) 


m0[31:0] 


Ismval 


i0[31:0] 


i1[31:0] 


a2[31 :0] 


{m0,m1} 


a0[31:0] 


{aO t a1} 


omuxl (mult-l) 


m2[31:01 


Ismval 


i2[31:0] 


i3[31:0] 


a2[cout] 


{m2,m3} 


a1[31:0] 


a3[31:0] 
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Mult Shift Up (1 new CSMMULT bit) 

Selectively causes the output of the multipliers to be shifted up by one bit for normalization. The 
LSB is then connected to Gnd. 

Adder Shift Down (1 new CSMMULT bit) 

Selectively causes the output of the adders to be shifted down by one bit for normalization. 

Input Mux (1 new population on the interconnect input mux) 

The current input mux supports the constants 0 and -1. It is proposed to add the constant 
0x00010001 to allow selective multiplication by 0, 1 and -1. 



2.2 Layout Floorplan 



InputmuxO 



Inputmuxl 



Multmux2 



Multmux3 



Mult2 



Mult3 



MultmuxO 



Multmuxl 



MultO 



Multl 



AdderO 



Adderl 



Adder2 



Adder3 



OutputmuxO 



Outputmuxl 
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2MULT - CS2112 Compatible mode 2 independent multipliers 



oO 



ol 



inpO 



OmuxO 



multO 




**1 mult2 



mult3 

\ *.J-I 




add2 



add3 



Omuxl 
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4ADD32 - Sum of 4 32-bit inputs 
-f 



a 
b 



+ 




+ 



+ 



+ 




+ 



oO 
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inpO 









multO 










1 





inpl 



inp2 



inp3 



OmuxO 









mult2 










1 
















mult3 


1 

























multl 


1 
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4ADD16 - Sum of 4 packed 16-bit inputs, sum of upper, lower 16-bits 



a 
b 

c 
d 




-J— oO 



H 




inpO 



inpl 



inp2 



inp3 



multO 



multl 



mult2 

D3i 



mult3 



0®' 



addl 





addO 























add2 
- — 




H 



OmuxO 



Omuxl 



add3 

33 
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4MULT - 4 multipliers with pack 16-bit inputs 
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a 
b 



c 
d 



"VT*^ 1 oOH 

Tf^)— f— oOL 



inpO 



inpl 



inp2 



jp2 



inp3 



H 



H 



multO 





H 



H 



multl 
* 

mult2 
mult3 





OmuxO 



mult2f3 1:161 



mult2F3 1:161 



OregO 



OmuxO 



mult2[31:16] 



mult3[31:16] 



+ 



Oregl 
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a 
b 



c 
d 
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inp2 



inp3 
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multO 
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a 
b 



c 
d 



l0-|- 

'<•>+ 



70-+ 





oo, 



01 



inpO 



OmuxO 



inpl 



inp2 



inp3 



iH 



H 



multO 
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CMULT16 - Complex Multiplier with 16-Bit Packed data, and indepent delay path. 
Assumes real part in High 16-bits, imaginary in Low 16-bits 



a 
b 



H 



H 




JJ7} 




H 




oo 



+ 



Ol 



inpO 



OmuxO 



inpl 



inp2 



inp3 



H 



H 



H 



H 




Omuxl 
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CMULT - 32-bit output complex multiply with 32-Bit accumulation input, Assumes real 
part in High 16-bits, imaginary in Low 16-bits 



a + 
b -f 



H 



H 




H 



H 



0+ 






+.) 1— oO 



+ )— I" ol 



inpO 



inpl 



inp2 



inp3 



H 



H 




multO 
multl 



H 



H 







addO 

















add2 




+ 





addl 























add3 




+ 



OmuxO 



Omuxl 
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InputmuxO 



Inputmux 1 



Multmux2 



Multmux3 



Mult2 



Mult3 



MultmuxO 



Multmuxl 



MultO 



Multl 



Desp/Corr Tree 0 



Desp/Corr Tree 1 



Desp/Corr Tree 2 



Desp/Corr Tree 3 



AdderO 



Adderl 



Adder2 



Adder3 



OutputmuxO 



Outputmuxl 
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